A comprehensive procedure to develop water quality index: A case study to the Huong river in Thua Thien Hue province, Central Vietnam

This work proposed a novel procedure for Water Quality Index (WQI) development that can be used for practical applications on a local or regional scale, based on available monitoring data. Principal component analysis (PCA) was applied to the monthly data of 11 water quality parameters (pH, conductivity (EC), total suspended solids (TSS), dissolved oxygen (DO), five-day biological oxygen demand (BOD), chemical oxygen demand (COD), ammonia (N-NH4), nitrate (N-NO3), phosphate (P-PO4), total coliform, and total dissolved iron) monitored at 11 sites on the Huong river in the years 2014–2016. From the PCA, the three extracted principal components explained 67% of the total variance of the original variables. From the set of communality values, the weight (w_i) for each parameter was determined. Linear sub-index functions were established based on the permissible limits from the National Technical Regulations on Surface Water Quality set up by the Vietnam Environment Agency (VEA) to derive the sub-index (q_i) for each parameter. The multiplicative formula, which is the product of the sub-indices (q_i) raised to the respective weights (w_i), was used to calculate the final WQI values. The proposed index (WQI) was then applied to the river with quarterly data of the 11 parameters monitored at ten sites in the years 2017–2020. The WQI representatively reflected the actual status of the overall river water quality: 97.8% of the WQI values belonged to the grades EXCELLENT and GOOD, and 2.2% to the grade MODERATE. Comparison of the river water quality evaluations resulting from the developed WQI with those from the WQI adopted by the National Sanitation Foundation (NSF-WQI) and the index issued by the Vietnam Environment Agency (VN-WQI) indicated that the proposed WQI was more suitable for river quality assessment.


Introduction
Water quality is important information in water resources management. Different uses of water require different water quality parameters, which include physical, chemical, and biological ones. For water quality assessment, water quality standards or guidelines have been established on international and regional scales. However, they evaluate individual parameters and do not give a general picture of the water quality at the sites or in the regions under study [1–5]. The development of water quality assessment methods based on a quantitative and comprehensive index has attracted considerable attention from scientists. The Water Quality Index (WQI) is a mathematical tool that converts water quality parameters into a single integer value depicting the overall health status of a water body [2,6–8]. The WQI developed by Brown et al. [1] was proposed by the National Sanitation Foundation (NSF-WQI) to assess surface water quality. The NSF-WQI has been applied worldwide, either as originally proposed or in modified form [2,9–11]. Many reviews of developed WQIs [2,4,5,9] indicated that WQIs have been widely used as an efficient tool to assess surface and underground water quality.
According to the reviews mentioned above, the following remarks can be drawn [4,10,12]: (i) although many WQIs are available, there is still a need for an overall WQI that can incorporate the available data and describe the water quality for different uses; (ii) significant discrepancies were observed in water quality classification among different methodologies; (iii) the most challenging aspect is that WQIs are developed for a specific region and are source-specific; therefore, there is a continuing interest in developing accurate WQIs that suit a local or regional area; (iv) no single WQI has been globally accepted; (v) there is no worldwide accepted method guiding the steps of WQI development; thus, further work in this field is still necessary to address the limitations of existing WQIs. These conclusions indicate the need for a method and a water quality index for practical applications on a local or regional scale, based on available monitoring data.
The aim of establishing a WQI is to transform the concentrations of selected water quality parameters (or variables), which have different units and dimensions, into dimensionless sub-indices, and then to choose an aggregation method that generates the numerical value of the index [2,4,10]. The general procedure to create a WQI consists of the following steps [2,4,5]: (i) selection of water quality parameters; (ii) computation of sub-index values through a transformation of the parameters to a standard scaling factor; (iii) estimation of weights for all parameters; (iv) aggregation of the sub-index values and weights to obtain the final WQI.

Selecting parameters
Based on a review of 30 existing WQIs, the parameter selection systems used to calculate WQIs were divided into three types: fixed, open, and mixed [4]. Most of those WQIs use a fixed set of parameters, commonly called "basic", because the selected parameters are the most significant ones for water quality evaluation in the study site or region [1,2,12–18]. A fixed system (e.g. NSF-WQI with 9 parameters) allows users to compare water quality status among sites or rivers, but does not allow new parameters needed for water quality assessment to be added [19]. Some WQIs use an open system that has no guidelines for the selection of parameters, for example, the WQI developed by the Canadian Council of Ministers of Environment [20]. This system makes comparisons among monitored sites and among river basins difficult [21]. The mixed system consists of basic and additional parameters. The selection of additional parameters incorporated into the WQI calculation depends on their sub-index values or their importance in reflecting river water quality [13]. Many studies indicated that the objective (less subjective) way to select parameters for the development of a WQI is based on statistical analysis of available monitoring data, such as correlation analysis and multivariate analysis techniques (principal component analysis, PCA, and factor analysis, FA) [2–4,22–24]. These issues indicate that a mixed system should be chosen to avoid 'rigidity', and that the selected parameters should be ones that are monitored routinely and are of great importance in reflecting river water quality.

Defining sub-indices
This step aims to transform the concentrations of selected water quality parameters into a standardized, unitless common scale, typically within an identical range, i.e. 0 (poorest) to 100 (best) or 0 (poorest) to 1 (best), called the sub-index [2]. To define sub-index values, WQI developers have established sub-index functions or rating curves for the different parameters [4,9]. Three methods are usually employed: (i) expert judgment, as in the NSF-WQI [1], the Oregon Index [12], and Almeida's Index [18]; (ii) use of water quality standards or guidelines [12–14,16,23,25–27]; and (iii) statistical methods. The use of water quality standards or guidelines facilitates sub-division of sub-index values and provides more information for the users [12]. Several procedures calculate the WQI directly from the parameters without transforming them into a common scale. For instance, the CCME-WQI development process [20] uses a specific mathematical equation to aggregate the index directly.

Estimating weights
The weights are assigned to the selected parameters according to their relative importance and their influence on the final index value [2,4]. The weights of the parameters can be either equal or unequal. A few WQIs use equal weights in the calculation [13,14,20,23,28–30], whereas many WQIs are calculated with unequal weights. The weights assigned to the parameters are commonly defined either by a participatory-based procedure, such as the Delphi method [1] or the Analytical Hierarchy Process [31], or by multivariate statistical analysis, mainly PCA and FA. To avoid subjective judgment from experts in the participatory-based procedure, index developers have suggested using PCA and FA to define parameter weights by different approaches [11,22,24,32–34]. Exploratory factor analysis (FA) is a dimension reduction method, similar in some respects to PCA, though different enough from PCA that the two should not be considered equivalent [35]. In practice, PCA is a relatively simple technique compared with FA. Because factor analysis involves so many options and complexities, the outcome of any analysis may differ depending on how many factors are retained in the solution [35,36]. A major issue for FA is the non-uniqueness of loadings: how well a given variable loads onto a given factor often depends on how many factors were extracted in the factor analysis [35,36]. Unlike in FA, the loading of a given variable onto an extracted principal component in PCA is unique [35]. This means that the variable loadings obtained from PCA reflect the intrinsic and actual influence or importance of the variables for the water body under study. Thus, a comprehensive and unique approach based only on PCA results to define the weights of water quality parameters is necessary for WQI development.

Aggregating the sub-index values into final WQI
Index aggregation is conducted after the assignment of weights to obtain the final WQI value. The two most common methods to aggregate the sub-indices are the additive (arithmetic) and multiplicative (geometric) methods. There are also other modified versions of the two methods [2,4]. The mixed aggregation methods (combination of additive and geometric methods) are proposed by some researchers [16,23,30]. The multiplicative method which is shown in Eq (1) has been adopted for final aggregation in many WQIs [1,11,18,37,38].
$$\mathrm{WQI} = \prod_{i=1}^{n} q_i^{w_i} \tag{1}$$
where n is the number of selected parameters for the WQI calculation, and q_i and w_i are the sub-index and weight of the i-th parameter, respectively.
The aggregation method used to create the final WQI value must be selected so that it avoids the problems of eclipsing and ambiguity [2]. Eclipsing arises when the final index value does not represent the actual state of overall water quality because the lower values of one or several sub-indices are dominated by the higher values of other sub-indices, or vice versa. Ambiguity occurs when the actual water quality is good but the final WQI indicates that it is bad, or vice versa [4,17,19,39,40].
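The two aggregation forms, and the eclipsing problem, can be illustrated with a minimal Python sketch. The function names and the sub-index values are hypothetical and chosen only for illustration; the multiplicative function follows Eq (1), and the additive function is the usual weighted arithmetic mean.

```python
import math


def _check_inputs(sub_indices, weights):
    """Basic validation: same length and weights summing to one."""
    if len(sub_indices) != len(weights):
        raise ValueError("sub_indices and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")


def wqi_multiplicative(sub_indices, weights):
    """Weighted geometric aggregation: product of q_i ** w_i (Eq 1)."""
    _check_inputs(sub_indices, weights)
    return math.prod(q ** w for q, w in zip(sub_indices, weights))


def wqi_additive(sub_indices, weights):
    """Weighted arithmetic aggregation: sum of w_i * q_i."""
    _check_inputs(sub_indices, weights)
    return sum(q * w for q, w in zip(sub_indices, weights))


# Hypothetical case: three excellent sub-indices and one at the worst value.
q_example = [100.0, 100.0, 100.0, 1.0]
w_example = [0.25, 0.25, 0.25, 0.25]

wqi_m = wqi_multiplicative(q_example, w_example)
wqi_a = wqi_additive(q_example, w_example)
```

With these made-up values, the single very poor sub-index pulls the multiplicative index down to about 32, whereas the additive index stays near 75. This is one way to see how an arithmetic mean can eclipse a poor parameter.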
With the aim of developing a comprehensive and simple WQI procedure using available monitoring data, this study is based on the following approaches: (i) a mixed system is used in parameter selection (basic and additional parameters); (ii) PCA is applied to estimate the relative weights of the parameters; (iii) sub-indices are determined from linear equations derived from the national water quality guidelines; (iv) the multiplicative formula is used as the aggregation method to calculate the final WQI. This WQI procedure is then applied to the Huong river in Thua Thien Hue province, Central Vietnam.

Study area
Hue City (belonging to Thua Thien Hue province) was the capital of Vietnam under the Nguyen Dynasty, which lasted from 1802 to 1945, and has been the political and cultural center of Central Vietnam since then. It is a noted sightseeing destination that has been registered as a World Cultural Heritage site since 1993. The Huong river, with a catchment area of 2830 km² and a population of 540,000 in its basin, is formed from two branches (Ta Trach and Huu Trach) that originate in the mountains in the west of the province and join at the Tuan confluence. The main part of the river, 32 km long, divides the city into a north part (old city) and a south part (new city), meets the Bo river at the Sinh confluence (15 km west of Hue city), and finally flows into the Tam Giang-Cau Hai lagoon (running along the seaside) and then to the East Sea at the Thuan An outlet (Fig 1). The average width and depth of the main river part are 200 m and 2-8 m, respectively. The Binh Dien hydro-power plant, with a capacity of 423.7 million m³, located upstream on the Huu Trach branch, has been in operation since 2009. The Ta Trach reservoir, with a capacity of 646 million m³, located upstream on the Ta Trach branch, has been used for flood control since 2013. A dam (Thao Long dam) was built at the mouth of the river in 2006 to prevent saline intrusion from the sea via the lagoon. The Huong river is the most important surface water source in the province and is used for domestic activities, industry, irrigation, navigation, tourism, aquaculture, etc. Van Nien and Gia Vien are currently the water intakes for two water treatment plants in the city. Wastewater discharged into the river, floods in the wet season (September-December), and saline intrusion in the dry season (January-August) are environmental concerns in the river basin. Air temperature in the province ranges from 21 to 38˚C, with an average of 24.8˚C.
The annual rainfall in the province ranges from 2700 mm to 3800 mm, with about 60% falling in the wet season. The average river flow ranged from 428 m³/s (in the dry season) to 553 m³/s (in the wet season), corresponding to median flows of 189 m³/s and 214 m³/s, respectively (calculated from monitoring data in the years 2014-2016).

Collection of water quality data
The water quality dataset used in this study covers seven years of monitoring (2014-2020). It was divided into two sets: the dataset for the years 2014-2016 was used for development of the WQI procedure, while the dataset for 2017-2020 was employed for testing the developed WQI procedure and assessing the water quality of the Huong river. The water quality monitoring program was performed by the Institute of Natural Resources, Environment, and Biotechnology (IREB), Hue University, with the support of the Ministry of Training and Education, Vietnam. The water quality data were monthly data from surface water samples collected every month at 11 monitoring sites (Hto, HT, Tto, TT, SH1-SH3, and SH5-SH8, shown in Fig 1) over a period of 3 years (2014-2016). Fourteen parameters were routinely monitored: temperature, pH, electrical conductivity (EC), total suspended solids (TSS), dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD), chemical oxygen demand (COD), ammonium (N-NH4), nitrate (N-NO3), phosphate (P-PO4), total coliform (TC), total dissolved iron (Fe), river velocity and flow rate. Several total dissolved heavy metals (Hg(II), Cd(II), As(III,V), Cr(VI), Pb(II), Cu(II), Zn(II)) and organochlorine pesticides (DDTs, HCHs) were monitored once or twice per year.
The river water quality was also monitored quarterly (in February, May, August and November) at ten sampling sites (HT, TT, and SH1-SH8, Fig 1) by the Center for Natural Resources and Environment Monitoring (CREM) with the support of the Thua Thien Hue Province-People Committee in the years 2017-2020. The monitored parameters were the same as mentioned above.

PLOS ONE

Analytical methods for the water quality parameters were adopted from Standard Methods for the Examination of Water and Waste Water [41]. Quality assurance and quality control procedures were conducted during monitoring and analysis to confirm the data quality. Quality control checks, consisting of repeatability, trueness, linearity, limit of detection (LOD) and blanks, were routinely undertaken to confirm confidence in the monitoring and analysis results [41].

Procedure of WQI development
The procedure of WQI development conducted in this study is described in Scheme 1.
• Parameter selection: Ten basic parameters (pH, EC, TSS, DO, BOD, COD, N-NH4, N-NO3, P-PO4, TC) and one additional parameter (Fe) were selected for the river WQI development. The parameters pH, EC, TSS and DO represent physical characteristics of the river. The parameters BOD and COD indicate the organic pollution level, and N-NH4, N-NO3 and P-PO4 indicate the eutrophication level of the river. The parameter TC describes the fecal bacteria pollution level of the river. Iron commonly occurs in river waters due to erosion and washing from the soil in river basins and is therefore selected as an additional parameter in the WQI model.

data) were very low, i.e. lower than the detection limit (LOD) or much lower than the limits of national guidelines on surface water quality [42] set up by Vietnam Ministry of Natural Resources and Environment/MONRE. The data set of the 11 parameters collected from IREB in the years 2014-2016 was used for the river WQI development. The original data set of 11 water quality parameters is supplied in S1 Data.
The data set of the 11 parameters (n = 11) collected from CREM in the years 2017-2020 (S2 Data) was used for testing the proposed WQI model and assessing the river water quality.
• Estimation of weights: Principal component analysis can reduce the dimensionality of a multivariate data set while maintaining its original structure to the maximum extent possible, and thus it is often used with environmental data. PCA reduces the total number of original variables to a smaller set of new variables (factors or components) while preserving the variability with a minimal loss of information. The PCA method extracts the components/factors from the correlation matrix that are necessary to explain the variance structure through linear combinations of the original variables [35]. For the PCA calculation, the original variables are commonly transformed into normalized variables, which have zero mean and unit variance, to remove the effects of variable unit and scale [35]. The eigenvalue of each component (or factor) is the amount of variance in the data set that is accounted for (or explained) by the component. The PCA calculation also gives the factor loading for each variable. Each factor loading represents the degree of contribution of the variable to the formation of the factor. The variables with the highest factor loadings are considered of greater importance and should have more influence on the factor [11,35]. In this study, the communality, which is the sum of the squared loadings on the retained principal components (PCs) for each variable, was used to calculate the weight in the WQI procedure. The variable with the highest communality is considered the most important, and vice versa. The PCA calculations were performed using the free software R, version 4.0.3/64-bit (10-10-2020), with R-Studio and the package Factoextra (version 1.0.7).
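The communality-based weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study itself used R with Factoextra, and the function name, variable names and loading values below are hypothetical placeholders, not values from Table 3.

```python
def communality_weights(loadings):
    """Compute communalities and relative weights from PCA loadings.

    loadings maps each variable name to its loadings on the retained
    principal components. The communality is the sum of squared loadings;
    the weight is the communality divided by the sum of all communalities.
    """
    communalities = {
        name: sum(value * value for value in row)
        for name, row in loadings.items()
    }
    total = sum(communalities.values())
    weights = {name: c / total for name, c in communalities.items()}
    return communalities, weights


# Hypothetical loadings of three variables on three retained components.
example_loadings = {
    "var_a": [0.8, 0.1, 0.2],
    "var_b": [0.3, -0.7, 0.1],
    "var_c": [0.2, 0.1, 0.6],
}

example_communalities, example_weights = communality_weights(example_loadings)
```

Because each weight is a share of the total communality, the weights sum to one by construction, which is the property required of w_i in the multiplicative formula.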

• Determination of sub-index values:
For the convenience of WQI users in defining the sub-index of each selected parameter (or variable), linear sub-index functions are established based on the permissible limits from the National Technical Regulations on Surface Water Quality (QCVN 08-MT:2015/BTNMT) [42] set up by the Vietnam Ministry of Natural Resources and Environment (MONRE) in 2015. The linear functional form for each variable (x) is:
$$y = a x + b \tag{2}$$
where y (or q) is the sub-index calculated from the monitored concentration of the variable x; a and b are derived from the two linear equations:
$$100 = a\,x_{A1} + b \tag{3}$$
$$1 = a\,x_{B2} + b \tag{4}$$
where y = 100 corresponds to the best quality for variable x (≤ the limit of class A1 indicated in the regulation, denoted here x_{A1}), and y = 1 corresponds to the worst quality for variable x (≥ the limit of class B2 in the regulation, denoted here x_{B2}).
The water quality limits for the selected parameters, extracted from QCVN 08-MT:2015/BTNMT, are shown in Table 1.
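The linear sub-index function can be sketched in Python as follows. The function name is hypothetical, and the limits used in the example (4 and 25 for a pollutant, 6 and 2 for a DO-like parameter) are placeholder numbers for illustration, not necessarily the values in Table 1. Clamping the result to the range 1-100 is an assumption that reflects the statement that concentrations better than the A1 limit score 100 and those worse than the B2 limit score 1.

```python
def linear_subindex(x, limit_a1, limit_b2):
    """Linear sub-index y = a*x + b through (limit_a1, 100) and (limit_b2, 1).

    Works for pollutants (limit_a1 < limit_b2) and for parameters where
    higher is better (limit_a1 > limit_b2). The result is clamped to 1..100.
    """
    if limit_a1 == limit_b2:
        raise ValueError("the two limits must differ")
    # Solve 100 = a*limit_a1 + b and 1 = a*limit_b2 + b for a and b.
    a = (100.0 - 1.0) / (limit_a1 - limit_b2)
    b = 100.0 - a * limit_a1
    y = a * x + b
    return max(1.0, min(100.0, y))


# Hypothetical pollutant with placeholder limits A1 = 4 and B2 = 25 (mg/L).
q_at_a1 = linear_subindex(4.0, 4.0, 25.0)
q_midway = linear_subindex(14.5, 4.0, 25.0)
q_beyond_b2 = linear_subindex(30.0, 4.0, 25.0)

# Hypothetical "higher is better" parameter with placeholder limits 6 and 2.
q_reverse = linear_subindex(4.0, 6.0, 2.0)
```

The same function covers both directions because the sign of the slope a follows from the order of the two limits.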
DO concentrations higher than saturation indicate excessive algal photosynthesis in eutrophic waters, leading to a reduction in water quality. The saturated DO concentration at 20˚C and an air pressure of 760 mmHg is 9 mg/L. This means that the sub-index (y) equals 100 for DO concentrations in the range of 6-9 mg/L (i.e. from the A1 limit to saturation, assuming that the lowest river water temperature was 20˚C). If the DO concentration is lower than 6 mg/L, the sub-index linear function for the parameter DO is determined following Eqs 3 and 4. If the DO concentration is over 9 mg/L (over saturation), a and b are derived from two equations. The pH limits in classes A1 and A2 stated in the regulation range from 6 to 8.5, corresponding to a sub-index of 100. For pH lower than 5.5 (limit B1) or higher than 9 (limit B2), the sub-index is equal to 1. This means that there are two sub-index functions for the parameter pH. Because the parameter EC is not regulated in QCVN 08-MT:2015/BTNMT [42], the sub-index linear function for EC is established based on the limits for the parameter TDS required in other regulations, accepting the approximation [43]:
$$\mathrm{TDS\ (mg/L)} = 0.65 \times \mathrm{EC\ (\mu S/cm)}$$
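Two of the special cases in this passage can be sketched in Python. The pH function assumes straight-line interpolation on each side of the 6-8.5 plateau, which is one reading of the "two sub-index functions" mentioned for pH; the conversion uses the approximation TDS (mg/L) = 0.65 × EC (μS/cm) that is also applied later for the NSF-WQI calculation. Function and constant names are hypothetical.

```python
TDS_PER_EC = 0.65  # approximate ratio TDS (mg/L) / EC (uS/cm)


def ph_subindex(ph):
    """Two-sided pH sub-index: 100 on the 6-8.5 plateau, 1 outside 5.5-9.

    Linear interpolation between the plateau and the outer limits is an
    assumption made for this sketch.
    """
    if 6.0 <= ph <= 8.5:
        return 100.0
    if ph <= 5.5 or ph >= 9.0:
        return 1.0
    if ph < 6.0:
        # Acidic branch: rises from 1 at pH 5.5 to 100 at pH 6.
        return 1.0 + 99.0 * (ph - 5.5) / (6.0 - 5.5)
    # Alkaline branch: falls from 100 at pH 8.5 to 1 at pH 9.
    return 100.0 - 99.0 * (ph - 8.5) / (9.0 - 8.5)


def ec_from_tds(tds_mg_per_l):
    """Convert a TDS limit (mg/L) into an approximate EC value (uS/cm)."""
    return tds_mg_per_l / TDS_PER_EC


ec_limit_for_tds_2000 = ec_from_tds(2000.0)
```

For a TDS limit of 2000 mg/L, the conversion gives about 3077 μS/cm.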
According to the National Technical Regulations on Drinking Water Quality (QCVN 01:2009/BYT) [44] set up by the Vietnam Ministry of Health, the limit for TDS is lower than 1000 mg/L, [...] [45] set up by Vietnam MONRE, the limit for TDS is lower than 2000 mg/L, approximately equivalent to EC < 3077 μS/cm, which corresponds to a sub-index (y) of 1. This means that in the sub-index linear equation for EC, a and b are derived from two equations.
• Aggregation of the sub-index values into final WQI: The multiplicative method (Eq 1) was used to calculate the final WQI, where q_i is the parameter sub-index, ranging from 1 (the worst quality) to 100 (the best quality), and w_i is the parameter weight defined from the PCA procedure, ranging from 0 to 1; the sum of the weights equals one.
The proposed WQI was then applied to evaluate the river water quality using the dataset for the years 2017-2020. The river water quality evaluations resulting from the proposed WQI were compared with the NSF-WQI and VN-WQI in several critical cases (parameter concentrations above or below the limits) to examine ambiguity and eclipsing of the indices in reflecting the river water quality.
The NSF-WQI is an index calculated according to either the multiplicative formula (Eq 1) or the additive one (Eq 10) with nine selected parameters (n = 9): temperature change (ΔT), pH, turbidity (Tur), total solids (TS), DO, BOD5, N-NO3, P-PO4 and fecal coliform (NSF-WQI, 1970). The additive formula includes the parameter weights:
$$\text{NSF-WQI}_A = \sum_{i=1}^{n} w_i q_i \tag{10}$$
In this study, the NSF-WQI was calculated according to both formulas (Eqs 1 and 10). The original data set of the nine water quality parameters mentioned above and the results obtained from the NSF-WQI calculation are supplied in S3 Data. The parameter sub-index (q_i) was derived from the respective rating curve. The DO concentration (mg/L) at a given water temperature (extracted from S2 Data) was converted into DO saturation (%) to define the sub-index for the parameter DO. The parameter ΔT was obtained by subtracting the upstream temperature from the downstream temperature and recording the result as the temperature change (˚C). The parameter TS was taken to be the sum of TDS and TSS: TS = TDS + TSS, where the TDS (total dissolved solids) concentration was estimated by TDS (mg/L) = 0.65 × EC (μS/cm); the parameters EC and TSS were extracted from S2 Data. The fecal coliform concentration was replaced by the total coliform (TC) concentration for the NSF-WQI calculation. The relative weights for the parameters (w_i in parentheses) are as follows (in decreasing order of the [...]
The VN-WQI is an index without parameter weights, meaning that the selected parameters have equal weight (weights are all equal to one).
The sub-index value for each parameter is defined from the normalized scales given in the appropriate table. The sub-index for the parameter DO is derived from a given equation using the monitored water temperature. The final VN-WQI value is calculated with both multiplicative and additive methods (the VN-WQI model is provided in S1 Text). In this study, the VN-WQI applied to the river was calculated from eight parameters (n = 8): pH (Group I); DO, BOD, COD, N-NH4, N-NO3 and P-PO4 (Group IV); and TC (Group V). The heavy metals As, Cd, Pb, Cr(VI), Cu, Zn and Hg (Group III) and organochlorides such as aldrin, BHCs, dieldrin, DDTs, heptachlor and heptachlor epoxide (Group II) were not selected for the VN-WQI calculation because their concentrations monitored in the river samples in the years 2017-2020 were lower than the detection limits (LODs) or much lower than the limits regulated by Vietnam MONRE (QCVN 08-MT:2015/BTNMT) [42].

Application of principal component analysis to define weights
Arief et al. [4] recommended a minimum of 150-300 cases for principal component analysis (PCA) and factor analysis (FA) to achieve reliable results. This study satisfies this criterion, as it uses monthly data of the 11 parameters at 11 monitoring sites over three years (2014-2016), i.e. 396 cases (= 11 × 12 × 3).
Descriptive statistics, computed in Microsoft Excel using the Real Statistics tool, are presented in Table 1. The National Technical Regulation on Surface Water Quality set up by Vietnam MONRE in 2015 (QCVN 08-MT:2015/BTNMT) [42] is also included in Table 1 to indicate the permissible limits of the parameters that are used for establishing the linear sub-index functions. These results are also used for a preliminary overview of the river water quality, which is discussed in the next sections.
The PCA procedure was performed on the Pearson correlation matrix of the 11 selected variables, extracting 11 new components with their own eigenvalues. The criteria for deciding the number of components to be retained were adopted from previous WQI developers [11,24,46]. Ideally, the retained components should have the following characteristics: (i) their cumulative contribution to the overall variance is greater than 60%; (ii) their associated eigenvalues are higher than one. A component with an eigenvalue higher than one should be retained because it explains at least as much variance as one original variable in the data set; if the eigenvalue is below one, the new component does not provide more information than an original variable and is therefore of little interest [24,35]. Table 2 presents the eigenvalues from the PCA, the percentage of variance explained by each component and the cumulative variance. The cumulative variance for the first three principal components (Comp.1-Comp.3), which is equal to 67.0%, satisfies the recommendations, and these components were adopted for the calculation of the parameter weights in the proposed WQI. The remaining 33% of the total variance of the data was assigned to 'noise' or background variation.
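The two retention criteria can be expressed as a short Python sketch. The function name is hypothetical. In the example list, the first three eigenvalues are back-calculated from the variance shares quoted in this section (share × 11 variables, for a correlation-matrix PCA, with the first share taken as 67.0% minus 12.8% minus 9.4%); the remaining eight values are made-up placeholders chosen only so that the eigenvalues sum to 11, and they are not the values in Table 2.

```python
def retained_components(eigenvalues, min_eigenvalue=1.0, min_cumulative=0.60):
    """Apply the eigenvalue > 1 rule and check the cumulative variance.

    Returns the number of retained components, their cumulative share of
    the total variance, and whether that share exceeds min_cumulative.
    """
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    kept = [value for value in ordered if value > min_eigenvalue]
    cumulative = sum(kept) / total
    return len(kept), cumulative, cumulative > min_cumulative


# First three values: 0.448*11, 0.128*11, 0.094*11 (from the reported shares).
# Last eight values: hypothetical placeholders, each below one.
example_eigenvalues = [4.928, 1.408, 1.034,
                       0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.13]

n_kept, cumulative_share, passes = retained_components(example_eigenvalues)
```

With these inputs, three components are retained and together they account for about 67% of the total variance, so both criteria are met.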
The PCA outputs helped evaluate how well each variable is explained in the analysis, that is, which variables are responsible for the patterns seen among the observations. The factor loading from the PCA is the correlation of the variable with the respective component. A positive loading indicates a positive correlation of the variable with the component; a negative loading indicates a negative correlation. In other words, a variable with a negative loading varies in the direction opposite to that of the component. Table 3 shows the factor loadings of the variables on the first three principal components (PC1-PC3). The loading plots for PC1 × PC2 and PC2 × PC3 are shown in Fig 2. The results in Table 3 [...]. This classification was adopted by Ouyang [48] and Singh et al. [49]. Thus, PC1 accounts for the nine variables related to water quality that emerged with strong to moderate loadings (absolute values higher than 0.5). The TSS and EC variables had very weak loadings on PC1, 0.168 and 0.257, respectively. Most of these nine variables have positive correlations with PC1, except for the variables pH and DO, which have negative correlations (variation directions opposite to the positive direction of PC1).
(ii) PC2 explains 12.8% of the total variance of the data and mainly accounts for two (2) variables with negative correlation: TSS (-0.740) and pH (-0.564).
(iii) PC3 explains only 9.4% of the total variance of the data and mainly accounts for two (2) variables: EC (0.796) and TSS (-0.519).
The next step in the WQI formulation is to define the degree of relevance of each variable (or parameter), which helps establish the relative weight (w_i). From the factor loading values in Table 3, the squared loadings and then the communality values, which represent the amount of variance explained by each variable in the factorial solution, are calculated. Table 4 presents the squared loadings and communality values for the variables on the three principal components (PC1-PC3). The largest communality value in the column is for the parameter EC (0.875), which gives the greatest relative weight (w_i), and the smallest communality value is for Fe (0.452), which gives the smallest relative weight. The relative weight (w_i) of each parameter is then obtained by dividing its communality value by the sum of the communality values in the column (7.374). The relative weights (w_i) calculated in this way for each parameter are shown in Table 4. The sum of the eleven weights is one (1.00). Thus, the PCA helped to define the weight of importance for each parameter, independent of subjective assessments. The next step is to transform the concentration monitored for each parameter into a dimensionless grade (sub-index q_i) in order to calculate the WQI value for each water sample.
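As a worked check of this weighting rule, using only the two communality values and the column sum quoted above (the symbol h_i² for the communality of parameter i is a notation assumed here for illustration):

```latex
w_i = \frac{h_i^2}{\sum_{j=1}^{11} h_j^2}, \qquad
w_{\mathrm{EC}} = \frac{0.875}{7.374} \approx 0.119, \qquad
w_{\mathrm{Fe}} = \frac{0.452}{7.374} \approx 0.061
```

The weights of the other nine parameters lie between these two values, since EC has the largest and Fe the smallest communality.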

Linear functions to transform dimensional water quality parameters into dimensionless sub-indices
Linear curves with the monitored concentrations of the parameters in the abscissa and the grades (sub-indices q) ranging from 1 to 100 in the ordinate were developed using the limits for surface water quality regulated by Vietnam MONRE (QCVN 08-MT:2015/BTNMT [42], shown in Table 1) and the procedure described above. Fig 3 shows the curves (concentration

Application of the WQI to Huong river in Thua Thien Hue province
In the period 2014-2020, there was no publication on WQI development to assess the quality of the Huong river. The proposed WQI was, for the first time, applied to evaluate Huong river water quality in the period 2017-2020. The final WQI values were calculated using the multiplicative formula with the respective weights and sub-indices (Eq (1)). The WQI results are shown in Table 5. The calculations presented in the spreadsheet (S2 Data: the river water quality data set for the years 2017-2020, with 1980 data points in total, i.e. 180 cases × 11 variables) indicated that 96.6% of the set had concentrations below the A1 limit (89.1%) and A2 limit (7.5%); 3.2% of the set had concentrations above the B1 limit (2.4%) and B2 limit (0.8%); and 0.2% of the set had concentrations above the B2 limit. Based on these results, around 97% of the WQI values were expected to be of grades EXCELLENT or GOOD and around 3% of grades MODERATE or POOR. These expectations closely match the river WQI values: 97.8% of the WQI values were of grades EXCELLENT or GOOD and 2.2% of grade MODERATE.
Generally, the river water quality was rather good in terms of the WQI: 97.8% of the values were of grades EXCELLENT or GOOD. Discharging water from the Ta Trach reservoir into the river in the [...] Table 6). In addition, rather high concentrations of total coliform (TC) at site SH5 in Aug. 2019 (15000 MPN/100 mL, above the B2 limit) and at site SH6 in Nov. 2020 (4600 MPN/100 mL, above the A2 limit) also contributed to the decrease in the WQI values (= 66, corresponding to the grade MODERATE). These results indicated that the proposed WQI was a sensitive reflection of the river water quality. For comparison, the NSF-WQI and VN-WQI were also calculated for the monitoring session in Nov. 2020 (also shown in Table 6).
The results in Table 6 show that, compared with the proposed WQI, the NSF-WQI_M and NSF-WQI_A values are remarkably lower. The reason is that the relative weights for the parameters DO and TC in the NSF-WQI are higher than those in the proposed WQI. Although the water quality grades from the NSF-WQI_A and the proposed WQI are the same in four of eight cases, the values of the two indexes are significantly different (p = 0.044; paired t-test). In addition, the differences between the NSF-WQI and the proposed WQI in reflecting the river water quality arose from differences in the selected parameters and in the number of parameters incorporated in the indexes. Collating the results of these indexes (NSF-WQI_M, NSF-WQI_A and the proposed WQI) with the monitored parameter values relative to the limits in the Vietnam MONRE regulations shows that the proposed WQI is more suitable for the river water quality assessment. Also, compared with the VN-WQI, the proposed WQI shows no ambiguity or eclipsing, because it represents the actual state of overall water quality. The VN-WQI is less representative because the parameters TSS and Fe are not integrated into its calculation. Another issue with the VN-WQI is that it does not reflect the impact of saline intrusion on the water quality, because no parameter related to dissolved solids, such as EC or TDS, is integrated into the index.
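The paired t-test mentioned above compares two indexes computed on the same samples. A minimal Python sketch of the test statistic, using only the standard library, is given below. The function name is hypothetical and the two lists of index values are invented for illustration; they are not the values in Table 6, so the result does not reproduce the reported p-value. Converting the statistic into a p-value requires the t-distribution, which a statistics package would provide.

```python
import math
import statistics


def paired_t_statistic(values_a, values_b):
    """Return the paired t statistic and its degrees of freedom.

    t = mean(d) / (stdev(d) / sqrt(n)), where d are the pairwise differences.
    """
    if len(values_a) != len(values_b):
        raise ValueError("both samples must have the same length")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    return t_value, n - 1


# Hypothetical index values for eight paired cases (not from Table 6).
index_one = [90, 85, 88, 92, 80, 86, 91, 66]
index_two = [85, 80, 86, 85, 78, 80, 88, 60]

t_stat, dof = paired_t_statistic(index_one, index_two)
```

A paired design is appropriate here because both indexes are evaluated on the same monitoring sessions and sites, so the test is applied to the per-case differences rather than to two independent groups.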

Conclusion
A comprehensive and simple procedure to develop the WQI using the available monitoring data of Huong river water quality was proposed. A multivariate technique (PCA) was applied to objectively define the relative weight (w_i) of each water quality parameter, based on the set of communality values for the 11 selected parameters. Using the limits from the national guideline on surface water quality to establish the linear functions that transform the dimensional concentration into a dimensionless sub-index (q_i) for each parameter provided convenience for WQI users. The multiplicative formula, in which each sub-index (q_i) is raised to the power of its weight of importance (w_i), was used to calculate the final WQI values. Comparison of the river water quality evaluations resulting from the proposed index (WQI) with those from the NSF-WQI and the index issued by the Vietnam Environment Agency (VN-WQI) in 2019 indicated that the three indexes gave different classifications. The representative reflection of the actual state of the general river water quality in terms of the WQI shows that the WQI avoided the ambiguity and eclipsing that occurred with the VN-WQI. Finally, the developed procedure and WQI could be used for river quality assessment in the coming years as well as for practical applications on a local or regional scale.